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Abstract 

We develop the method of Maximum Entropy (ME) as a technique to 
generate approximations to probability distributions. The central results 
consist in (a) justifying the use of relative entropy as the uniquely natural 
criterion to select a "best" approximation from within a family of trial 
distributions, and (b) to quantify the extent to which non-optimal trial 
distributions are ruled out. The Bogoliubov variational method is shown 
to be included as a special case. As an illustration we apply our method 
to simple fluids. In a first use of the ME method the "exact" canoni- 
cal distribution is approximated by that of a fluid of hard spheres and 
ME is used to select the optimal value of the hard-sphere diameter. A 
second, more refined application of the ME method approximates the "ex- 
act" distribution by a suitably weighed average over different hard-sphere 
diameters and leads to a considerable improvement in accounting for the 
soft-core nature of the interatomic potential. As a specific example, the 
radial distribution function and the equation of state for a Lennard- Jones 
fluid (Argon) are compared with results from molecular dynamics simu- 
lations. 

KEY WORDS: Variational method; maximum entropy; simple fluids 

1 Introduction 

A common problem is that our beliefs and our knowledge are often represented 
by a probability distribution P(x) that is too complicated to be useful for practi- 
cal calculations and we need to find a more tractable approximation. A possible 
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solution is to identify a family of trial distributions {p{x)} and select the mem- 
ber of the family that is closest to the exact distribution P(x). The problem, of 
course, is that it is not clear what one means by 'closest'. We could minimize 



but why this particular functional and not another? And also, why limit oneself 
to an approximation by a single member of the trial family? Why not consider 
some linear combination of the trial distributions, some kind of average over the 
trial family? But then, how should we choose the optimal weight assigned to 
each p{x)l And again, what does 'optimal' mean? We propose to tackle these 
questions using the method of maximum entropy. 

The method of Maximum Entropy as advocated by Jaynes (MaxEnt) is a 
method to assign probabilities on the basis of partial information The core 
of the MaxEnt method resides in interpreting entropy, through the Shannon 
axioms, as a measure of the "amount of uncertainty" or of the "amount of 
information that is missing" in a probability distribution. This was an important 
step forward because it extended the applicability of the notion of entropy far 
beyond its original roots in thermodynamics but it was not entirely free from 
objections. For example, one problem is that the Shannon axioms refer to 
probabilities of discrete variables; the entropy of continuous distributions was 
not defined. A second, perhaps more serious problem is that other measures 
of uncertainty were subsequently found. This immediately raised the question 
of which among the alternatives should one choose and motivated attempts 
to justify the method of maximum entropy directly as a method of inductive 
inference without referring to any specific measure of uncertainty [2]-[3- What 
eventually emerged is an extended form of the method of Maximum Entropy, 
which we abbreviate as ME to distinguish it from the original MaxEnt. The 
core of ME is a concept of (relative) entropy that reduces to the usual entropy 
in the special case of a uniform prior or uniform background measure and that 
does not require an interpretation in terms of heat, disorder, uncertainty, or 
even in terms of an amount of information: in the context of ME entropy needs 
no interpretation. 

The ME method is designed for processing information, for updating from 
a prior probability distribution (which in statistical mechanics is generally pos- 
tulated to be uniform) to a posterior distribution when new information in the 
form of constraints becomes available [8J. The chosen posterior distribution 
should coincide with the prior as closely as possible: ME implements the min- 
imal changes that are required to satisfy the new constraints. The significance 
of this development is that it allows one to tackle problems lying beyond the 
restricted scope of MaxEnt [9] • 

The main purpose of this paper is to extend the use of the ME method 
beyond processing information to a method to generate approximations [10] ; a 
second purpose is to illustrate the method by applying it to simple fluids. 

The general formalism, which is the main result of this paper, is developed 
in section [2] In section I2.1l we justify the use of relative entropy as the uniquely 
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natural criterion to select the best approximation. In section 12.21 as a simple 
illustration, the ME method is applied to trial distributions of the canonical 
Boltzmann-Gibbs form. The result, when expressed in terms of free energies, is 
recognized as the Bogoliubov variational principle [13] . which is now seen as a 
special case of ME [T3]. The first part of our paper concludes in section I2.3l with 
a quantitative analysis of the extent to which the distribution that maximizes the 
relative entropy is preferred over other members of the trial family. We show 
that our entropic argument does not completely rule out those distributions 
that fail to maximize the entropy. Therefore a suitably weighed average over 
the whole family of trial distributions with the optimal weight calculated using 
the ME method provides a better approximation than can be expected from 
any individual member of the trial family. 

The discussion up to this point is general - the ME method of generating 
best approximations is of general applicability - but it is useful to see how it 
is applied to a specific problem. In the second part of the paper we apply the 
ME method to simple classical fluids. The study of classical fluids is an old 
and mature field. There exist extensive treatments in many excellent books and 
reviews [15]- [17]. Our goal is to use this well-explored but still non-trivial field 
as a testing ground for the ME method. At this early stage in the development 
of the ME method we are not concerned with contributing to the study of the 
fluids themselves. Indeed, the reason we choose such a well-explored problem, 
is not because we are eager to find new results about fluids but because we 
can compare our results with those obtained using alternative approximation 
methods already in existence. We find that our results are quite competitive 
with those obtained using the best perturbative methods developed to date. 

The practical success or failure of the method hinges on choosing a family of 
trial distributions that incorporates the information that is relevant to the prob- 
lem of interest. In our example - simple fluids - it turns out that their structure 
is dominated, particularly at liquid densities, by the short-range repulsions be- 
tween molecules. The effects of the long-range attractions are averaged over 
many molecules and are less important, except at low densities. Accordingly 
we choose a fluid of hard spheres [H]-[T7] as our (almost) tractable trial model 
(section [3). The ME method is then used (section 2]) to select the optimal 
value of the hard-sphere diameter. This is equivalent to applying the Bogoli- 
ubov variational principle and reproduces the results obtained by by Mansoori 
et al. [18], and independently by Stell at al. [19]. 

A very different approximation scheme can be traced back to Lewis who used 
a maximum entropy argument to derive the Boltzmann equation [20j . and was 
further developed by Karkheck, Stell and coworkers [21]. Instead of selecting 
the optimal distribution from within a family of trial approximations, which is 
what wc do here, in their "kinetic variational theory" an entropy is maximized 
to determine the optimal closure for the BBGKY hierarchy of equations. Their 
method is suitable for tackling transport problems in dense simple fluids with 
potentials that are more realistic than just hard spheres. 

Alternative approaches to the study of fluids include perturbation theory. In 
such schemes the intcrmolccular potential u is written as uq + Su, where uo is a 
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strong short-range repulsion, which is eventually approximated by a hard-sphere 
potential Uhs, and Su is a long-range attraction treated as a perturbation. At 
high densities first order theory is quite accurate but at lower densities higher- 
order corrections must be included. There exist several proposals for how to 
separate u into uq and Su, for choosing the particular stage in the calculation 
where uq is replaced by Uhs, and also for choosing the best hard-sphere diameter. 
The most successful are those by Barker and Henderson [T5] and by Weeks, 
Chandler and Andersen (WCA) [22] , The latter succeeds in using the hard- 
sphere potential Uhs while effectively representing some of the effects of the 
soft-core potential uq. For a recent discussion of some of the strengths and 
limitations of the perturbative approach see [33] . 

An advantage of the variational and the ME methods over the perturbative 
approaches is that there is no need for ad hoc criteria dictating how to separate 
u into uq and Su, and how to choose a hard-sphere diameter. A disadvantage 
of the standard variational approach is that it fails to take the softness of the 
repulsive core into account, which leads, at high temperatures, to results that 
are inferior to the perturbative approaches. As we shall see below this limitation 
does not apply to the ME method. 

The standard variational approach allows one to select the optimal value 
of a parameter, in this case the hard-sphere diameter; all non-optimal values 
are ruled out. But, as discussed in [7J and |24) . the ME method allows one to 
quantify the extent to which non-optimal values should contribute. In this more 
complete use of the ME method (presented in section [5]) the "exact" probability 
distribution of the fluid is approximated by a statistical mixture of distributions 
corresponding to hard spheres of different diameters. This is a rather simple 
and elegant way to take into account the fact that the actual atoms are not 
hard spheres. The full ME analysis leads to significant improvements over the 
variational method. 

In section [6]we test our method by comparing its predictions for a Lennard- 
Jones model for Argon with the numerical molecular dynamics simulation data 
(|25j. pS]). We find that the ME predictions for thermodynamic variables and 
for the radial distribution function are considerable improvements over the stan- 
dard Bogoliubov variational result, and are comparable to the perturbative re- 
sults. Finally, our conclusions and some remarks on further improvements are 
presented in section [Jj 

2 Using ME to generate approximations 

The goal is to select from a family of trial distributions p{x) (the possible pos- 
teriors) that which is closest to a given "exact" distribution P(x) (the prior). 
The family of trials can be defined in a variety of ways. For example, one can 
specify a functional form pe{x), each member of the trial family being labelled 
by one or more parameters 0. More generally, one could define the trial family 
in a non-parametric way by specifying various constraints. 

The discussion below which develops the use of ME to generate approxi- 
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mations follows 7 where the ME method is developed for a different purpose, 
namely, as a method for processing information to update distributions. 

2.1 The logic behind the ME method 

The selection of one distribution from within a family is achieved by ranking the 
distributions according to increasing preference. Before we address the issue of 
what it is that makes one distribution preferable over another we note that there 
is one feature we must impose on any ranking scheme. The feature is transitiv- 
ity: if distribution p\ is preferred over distribution p 2 , and P2 is preferred over 
Ps, then pi is preferred over p 3 . Such transitive rankings are implemented by 
assigning to each p(x) a real number S[p] which we call the "entropy" of p. The 
numbers S[p] are such that if p\ is preferred over p 2 , then S[pi] > S[p2\- Thus, 
by design, the "best" approximation p is that which maximizes the entropy S[p\. 

Next we determine the functional form of S[p\. This is the general rule that 
provides the criterion for preference; in our case it defines what we mean by the 
"closest" or "best" approximation. The basic strategy [3] is one of induction: 
(1) if a general rule exists, then it must apply to special cases; (2) if in a certain 
special case we know which is the best approximation, then this knowledge can 
be used to constrain the form of S[p\; and finally, (3) if enough special cases are 
known, then S[p] will be completely determined. 

The known special cases are called the "axioms" of ME and they reflect the 
conviction that one should not change one's mind frivolously, that whatever 
information was originally codified into the exact P(x) is important and should 
be preserved. The selected trial distribution should coincide with the exact one 
as closely as possible and one should only tolerate those minimal changes that 
are demanded by the information that defines the family of trials. Three axioms 
and their consequences are listed below. Detailed proofs and more extensive 
comments are given in [7]. 

Axiom 1: Locality. Local information has local effects. If the constraints 
that define the trial family do not refer to a certain domain D of the variable x, 
then the conditional probabilities p(x\D) need not be revised, p(x\D) — P(x\D). 
The consequence of the axiom is that non-overlapping domains of x contribute 
additively to the entropy: S[p] = f dx F(p(x),x) where F is some unknown 
function. 

The motivation behind this axiom is the following. Suppose that the infor- 
mation, that is, the constraint, that defines the trial family, does not refer to a 
particular subdomain D of the variable x. This means that the probability of 
x conditional on its being in D is completely unsconstrained, and p(x\D) can 
take whatever value we desire. In other words, the family of trial distributions 
includes members that are capable of reproducing the conditional probability 
P(x\D) exactly. Axiom 1 says that if we can reproduce P(x\D) exactly then we 
should; that is, among the possible trials choose one such that p(x\D) — P(x\D). 
Clearly this is not just a good approximation, it is the best we can possibly do. 

Axiom 2: Coordinate invariance. The ranking should not depend on 
the system of coordinates. The coordinates that label the points x are ar- 
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bitrary; they carry no information. The consequence of this axiom is that 
S[p] = J dx p(x) f (p(x) / m(x)) involves coordinate invariants such as dxp(x) 
and p(x) I m(x) , where the function m(x) is a density, and both functions m and 
/ are, at this point, unknown. 

Next we make a second use of Axiom 1 (locality) . When there are no con- 
straints at all the family of trials includes the exact P{x) and the selected trial 
should coincide with P(x); that is, the best approximation to P(x) is P{x) itself. 
The consequence is that up to normalization the previously unknown density 
m(x) is the exact distribution P(x). 

Axiom 3: Consistency for independent subsystems. When a system is 
composed of subsystems that are believed to be independent it should not matter 
whether the approximation procedure treats them separately or jointly. Specif- 
ically, if x = (xi,X2), and the exact distributions for the subsystems, P\(xi) 
and P2(x2), are respectively approximated by pi(x\) and ^2(^2), then the ex- 
act distribution for the whole system P\(xi)P2(x2) should be approximated by 
Pi(xi)p2(x2)- This axiom restricts the function / to be a logarithm. 

The overall consequence of these axioms is that the trial approximations 
p(x) should be ranked relative to the exact P(x) according to their (relative) 
entropy, 

S\p\P] = -J dxp(x)\og^ y (2) 

The derivation has singled out the relative entropy «S[p|P] as the unique func- 
tional to be used for the purpose of selecting an optimal approximation. Other 
functionals, may be useful for other purposes, but they are not a generalization 
from the simple cases described in the axioms above. 



2.2 A special case: the variational method 

As an illustration consider a system with microstates labelled by q (for example, 
the location in phase space or perhaps the values of spin variables). Let the 
probability that the microstate lies within a particular range dq be given by the 
canonical distribution 

e -0H(q) 

P(q)dq = ~^—dq, (3) 

where 

Z = e-f> F = J dqe-^l (4) 

We want to approximate the "exact" P by a more tractable distribution p. The 
first step is to identify a family of trial distributions that are simple enough 
that actual calculations are feasible and that incorporates the appropriate rel- 
evant information. This step is difficult because there is no known systematic 
procedure to carry it out; it is a matter of trial and error guided by intuition. 
We will consider a family of trial distributions pg (q) that are canonical with a 
Hamiltonian Hg(q) that depends on parameters 9 = {9 1 , 9 2 , . . . , 9 n }, 

e -0H e (q) 

Pe(q)dq = — — dq, (5) 
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where 

Z e = e-f >F » = J dqe-P H °M. (6) 

The next step is to select the pg that maximizes S'[pe|P]. Substituting into 
eq.@ gives, 

S[pg\P}=[3((Hg-H)g-Fg + F) , (7) 

where (. . .)g refers to averages over the trial pg. The inequality 5[pe|P] < 0, 
can then be written as 

F<Fg + {H- Hg)g . (8) 

Thus, maximizing S'[pe|P] is equivalent to minimizing the quantity Fg + (H — 
Hg)g. This alternate form of the variational principle and its use to generate 
approximations is well known. It is usually associated with the name of Bogoli- 
ubov |13j and it is the main technique to generate mean field approximations 
for discrete systems of spins on a lattice. What is perhaps not as widely known 
is that the Bogoliubov variational principle is just the special case of applying 
the ME method to trial distributions of the canonical Boltzmann-Gibbs form. 



2.3 To what extent are non-optimal distributions ruled 
out? 

The example above does not exhaust the power of the ME method: we have 
found a way to determine the optimal choice of 9 but ME allows us to go 
further and quantify the extent to which the optimal 9 is preferred over other 
non-optimal values ([7], 

To what extent do we believe that any particular 9 should have been chosen? 
This is a question about the probability of 9, p(9). The original problem of 
assigning a probability to q is now broadened into assigning probabilities to q 
and 9. Here, we use ME not just to find the optimal approximation p(q) but 
also to find the optimal joint distribution p(q, 9). 

Notice that this is the kind of problem where it is necessary to adopt a 
Bayesian interpretation of probabilities. Within a frequentist interpretation it 
makes no sense to talk about p{9) or p(q, 9) because 9 is not a random variable; 
the value of 9 is unknown but it is not random. 

To proceed we must address two questions. First, what is the prior dis- 
tribution, that is, what do we know about q and 9 before we are given the 
constraints? And second, what are the constraints that define the family of 
joint trials p(q, 9)1 

When we know nothing about the 9s we know neither their physical meaning 
nor whether there is any relation to the q. A joint prior m(q, 9) that reflects this 
lack of correlations is a product, m(q, 9) = P(q)fi(9), where P(q) is the "exact" 
distribution, say eq.((3]), and fi(9) should reflect our ignorance about 9: it should 
be as uniform as possible and make every volume element in 9 space as likely 
as any other. But if we know absolutely nothing about 9 we also do not know 
how to measure volumes in 9 space. 
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To make further progress we need the additional information that provides 
meaning to the 9s, namely, that they are parameters labeling the family of 
distributions pe(q). Remarkably this is sufficient to allow us to introduce a 
measure of distance in 9 space that is both natural and unique: we define the 
distance between 9 and 9 + d9 to be the same as the distance between the 
corresponding pg and pe+de which is given by the Fisher-Rao metric d£ 2 = 
-fijd9 l d9 : > , where 

f d log p e {q) d log p e (q) 
lij= j dqp e (q) —. w —. (9) 

This is the only Riemannian metric that takes proper account of the fact that 
the 9s are not just structureless points but represent probability distributions 
[27] , Accordingly, the volume of a small region d9 is r y 1 ^ 2 (9)d9, where j(9) is 
the determinant of 7^. Up to an irrelevant normalization, the distribution fj,(9) 
that is uniform in 9 is given by ijl{9) — 7 1 / 2 (#). 

The second question about the constraints that define the family of trials is 
straightforward: of all joint distributions p(q, 9) — p(9)p(q\9) we are only inter- 
ested in the subset of those distributions such that p(q\9) = pe{q). Therefore, 
the trials p(q, 9) are constrained to be of the form p(q, 9) = p(9)pg(q). 

Now we allow the ME method to take over: the best approximation p(q, 9) 
to the joint distribution P{q)^f 1 / 2 {9) is obtained by maximizing the entropy 

a\p{q,e)\l 1/2 P] =~ j dqd9p(9)p g (q) log -^gM_ ; (10) 

by varying p(9) subject to J d9p(9) = 1. The final result for the probability 
that 9 lies within the small volume j 1 / 2 (9)d9 is 

P (9)d9 = J e s ^ p h 1/2 (0)d9 , (11) 

where £ is a normalization constant. 

Eq. fTTj) tells us that the preferred value of 9 maximizes the entropy 5[pe|P], 
Eq.®, because this maximizes the probability density exp S[pe\P]. It also tells 
us the degree to which values of 9 away from the maximum are ruled out. For 
macroscopic systems the preference for the ME distribution can be overwhelm- 
ing. Note also that the density exp S[pg\P] is a scalar function and the presence 
of the Jacobian factor 7 1 ^ 2 (#) makes Eq. flTTj) manifestly invariant under changes 
of the coordinates 9. 

Finally, now that we have determined the joint distribution p{q, 9) — p(9)pg{q) 
we can marginalize 9 and use the average 

p{q) = J d9p{9)p e {q) (12) 

as the best approximation we can construct out of the given trial family. This 
approximation is expected to be better than any individual pg (q) for the same 
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reason that the mean is expected to be a better estimator than the mode - it 
minimizes the variance. 

This idea of introducing mixtures of probability distributions as in eq. (|12p 
might seem strange at first sight but it is actually quite natural. For example, 
if we know that a system is in thermal equilibrium at temperature T then we 
describe its macrostate using the canonical distribution. But if we are uncertain 
about the actual value of the temperature then a better description is given by 
a suitable weighted average over T. Eq. fTTj) gives the appropriate weights. 

The procedure above is mathematically straightforward but the shift from 
the original problem of assigning a probability to q into the new problem of 
assigning probabilities to q and 6 can be a potential source of confusion. It might 
appear that the maximization of the two entropies S in eq.Q and a in eq. (|10p 
has lead to two different best approximations: one is pe(q) with 8 maximizing 
eq. ((7| , and the other is p(q) in eq. fl2]) . How can we have two different answers to 
the same question? The answer is that we actually have two different questions. 
Maximizing S answers the question: "What is the best single pe(g)? Or, what 
is the best approximation obtainable in terms of a single trial?" Maximizing a 
answers a different question: "What is the best joint distribution p(q,8)7 Or, 
equivalently, what is the best approximation when we are not restricted to a 
single trial?" 

This concludes the first part of our paper. To summarize: our main results 
consist in the justification of the relative entropy eq.® as the uniquely natural 
functional to select the best approximations and the derivation of a quantitative 
measure of the degree to which the various trials are preferred, eq. (fTT|) . The 
final result for the best approximation is eq.(|12|). 

Next we illustrate how this abstract formalism is used in a specific example. 



Here we collect some necessary background material on simple fluids. 

We consider a simple fluid composed of N single atom molecules described 
by the Hamiltonian 



where gjy = {Pii^i, i = lj ■■■jN} and the many-body interactions are approxi- 
mated by a pair interaction, it(rjj) where = \n — rj\. The probability that 
the positions and momenta of the molecules lie within the phase space volume 
dqN is given by canonical distribution, and 



3 Simple fluids 




(13) 



P(qN) dq N 



L e -0H{1N) dq 



(14) 



where 



1 



N 



dq N 



Nl h 3N 



Y[ d 3 pid 3 n 



(15) 
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and 

Z= [ dq N e -Ww). (1 6 ) 



For fluids dominated by pair interactions most thermodynamic quantities of 
interest can be written in terms of the one- and two-particle density distributions 



where 



and 



n(r) = (n(r)) and n^ 2 \ri,r 2 ) — {rS 2 \ri,r 2 )) (17) 
h(r)=J2 6 ( r - r i) (18) 



n ~'(ri,r 2 ) = ^ 5(ri - ri)8(r 2 ~ r ) . (19) 



It is convenient to introduce the two-particle correlation function 



g(ri,r 2 ) = 7-r-T-r , (20) 

which measures the extent to which the structure of liquids deviates from com- 
plete randomness. If the fluid is homogeneous and isotropic n(r) = p = N/V 
and g(ri,r 2 ) — g(\fi — r 2 1) = g(r) where p is the bulk density and g(r) is the 
radial distribution function (RDF). Then, the pressure is given by 

PV 0p f 3 du(r) 

= 1 -^r / d rr —n-a(.r) , (21) 



Nk B T 6 J dr 

where d = l/k B T. 

The difficulty, of course, is that it is very difficult to calculate g(r) from 
the "exact" distribution Eq. (|14p and we need to find an approximation that is 
calculable and still includes the two features of the interaction potential u that 
are relevant for explaining most fluid properties: the strong repulsion at short 
distances and the weak attraction at long distances. 

To account for the short-distance repulsion we consider a family of trials 
composed by distributions that describe a gas of hard spheres of diameter r^. 
For each the Hamiltonian is 

N 2 

H hs (q N \r d ) = y ^ + U hs (22) 
* — ' Ira 

i=l 

with 

N 

U hs = ^u hs (r i: j\r d ) , (23) 

i>j 

where 

/ 1 \ JO for r > rd / .x 
u hs (r\r d ) = i ^ for r<rd (24) 
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and the corresponding probability distribution is 

Phs(q N \r d ) = J- e -/"WwN) . (25) 

^hs 

The partition function and the free energy Fh s (T, V, N \r d ) are 

Z hs = ( dq N e ~PH M r d ) d|f e -0F hB (T,V,N M . (26) 



Two objections that can be raised to the choosing Phs(qN\rd) as trials are, first, 
that they do not take the long-range interactions into account; and second, 
that the actual short range potential is not that of hard spheres. These are 
points to which we will return later. A third objection, and this is considerably 
more serious, is that the exact hard-sphere RDF is not known. However, it can 
be calculated within the approximation of Percus and Yevick (PY) for which 
there exists an exact analytical solution ([IE], which is reasonably simple 
and in good agreement with numerical simulations over an extended range of 
temperatures and densities, except perhaps at high densities. There are several 
successful proposals [30] to improve upon the PY RDF but they also represent an 
additional level of complication. We feel that the simpler PY RDF is sufficiently 
accurate for our current objective - to illustrate the application and study the 
broad features of the ME approach. We will therefore assume that for our 
purposes the Ph s are sufficiently tractable distributions. 

The PY RDF can be written in terms of the Laplace transform of rgh s {f \r d ) 



G(s) = J dx xg hs (xr d \r d )e~ 
o 

sL{s) 



1277 [L(s) + M(s)e s ] ' 
where x is a dimensionless variable x = r/rd, 



L(s) = 1277 



1 + ^77) .s ! (] - 2,i) 



(27) 



(28) 



M{s) = (1 - 77) 2 s 3 + 677(1- 77) s 2 

+ 18?7 2 s- 1277 (I + 277), (29) 

and 77 is the packing fraction, 

dcf 1 3 .,, N 
V = -^pr d with p= — ■ (30) 

The RDF ghs(r \rd) is obtained from the inverse transform using residues |31j . 
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The equation of state can then be computed in two alternative ways, either 
from the "pressure" equation or from the "compressibility" equation but, since 
the result above for ghs{r\rd) is not exact, the two results do not agree. It 
has been found that better agreement with simulations and with virial coeffi- 
cients is obtained taking an average of the two results with weights 1/3 and 2/3 
respectively. The result is the Carnahan-Starling equation of state, [H]-[T7] 

( PV\ _ 1 + 7/ + i] 2 - t? 3 

\Nk B T) hs - (!_„)» • (31) 
The free energy, derived by integrating the equation of state, is 

4r/ — 3rj 



F hs (T,V,N\r d )=Nk B T 



-1 + ln/jA J 



where A = (27r?i 2 /mfcsT) 1 / 2 , and the entropy is 



(32) 



S - — (^) w -^ + 5 m - (33) 

It must be remembered that these expressions are not exact. They are reason- 
able approximations for all densities up to almost crystalline densities (about 
r\ w 0.5). However, they fail to predict the face-centered-cubic phase when r\ is 
in the range from 0.5 up the close-packing value of 0.74. 



4 The optimal hard-sphere diameter 

As discussed in section [5J the trial PhsiflNV d) that is "closest" to the "exact" 
P(qn) is found by maximizing the relative entropy, 

S [P hs \P] = -Jdq N P hs {q N \r d ) log < 0. (34) 

Substituting Eqs. (Ti"4"|) and ([23]) we obtain 

S [P hs \P] =P[F-F hs -{U - U hs ) hs ] < , (35) 



where (• • ■ is computed over the hard-sphere distribution PhsiqN^d)- Eq. (|3"5f 
can be rewritten as 

F < F v d = F hs + (U- U hs ) hs , (36) 

which shows that maximizing S [Ph s \P] is equivalent to minimizing Fjj over all 
diameters r d . Thus, the variational approximation to the free energy is 

F (T, V, N) w Fu (T, V, N | r m ) d = ruin F v (T, V, N \r d ) , (37) 
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where r rn is the optimal diameter. 
To calculate Fu use 

(U-U hs ) hs = II d 3 rd 3 r' n$(rS) 

[u(r - r') - u hs (r - r'\r d )] , (38) 

where n^(r,r') = (f^ 2 \r,r'))h s . But U] ls (r — r'\r<i) = for \r — r'\ > r d while 

f 21 

n h J(r,r') = for \r — r'\ < Td, therefore 

F v = F hs + (U) hs (39) 

with 

(U) ha = \Np J d 3 ru(r)g hs (r\r d ), (40) 
where we have assumed that the fluid is isotropic and homogeneous, n h J (r, r') — 

(2) 

n hs (l r — r 'l)> an< ^ introduced the hard-sphere radial distribution function 



9hs{r \r d ) 



def nfiW 

P 2 ' 



(41) 



Notice that the approximation does not consist of merely replacing the exact 
free energy F by a hard-sphere free energy Fh s which does not include the 
effects of long range attraction; F is approximated by Fu{r m ) which includes 
attraction effects through the (U)hs term in eq.(|39|). This addresses the first 
of the two objections mentioned earlier: the real fluid with interactions given 
by u is not being replaced by a hard-sphere fluid with interactions given by 
Uhs ; it is just the probability distribution that is being replaced in this way. 
The internal energy is approximated by (H)^ = ^NkgT + {U)h s an d not by 



{Hhs)hs 



\Nk B T. 



To calculate (U)h s it is convenient to write it in terms of V(s), the inverse 
Laplace transform of ru(r), 



(xr d ) — ds V(s)e~ 



(42) 



For example, for a Lennard-Jones potential, 

12 



u(r) = 4e 



(- 



we have 



V(s) = 4e 



12 ^ 

10! 



a\ 6 
r ) 
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V 

4! 



(43) 



(44) 
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Then, using eq. ([3"5)) and eq. ([2"T)) gives 



(U) hs = 12N V J ds V(s)G(s). (45) 
o 

Finally, it remains to minimize Fjj in eq. (|39p to determine the optimal diameter 
r rn . This is done numerically; an explicit example for Argon is calculated in 
section [5] 

As mentioned earlier the problem with the approach outlined above is that 
it fails to take the softness of the repulsive core into account. This flaw is 
manifested in a less satisfactory prediction of thermodynamic variables at high 
temperatures, and also, as will be shown by the numerical calculations in section 
[51 in a poor prediction of the radial distribution function. 



5 Beyond the Variational Principle 

The ME method as pursued in the last section has led us to determine an 
optimal value of the hard-sphere diameter. Next we consider whether the correct 
choice should have been some other value r e i — r m + Sr rather than the optimal 
fd = r m . As discussed in section 12.31 this is a question about the probability 
of rd, Pd(fd)- Thus, we are uncertain not just about but also about Td and 
what we actually seek is the joint probability of qw and r^, Pj(qN ,fd). Once 
this joint distribution is obtained our best assessment of the distribution of q^ 
is given by the marginal over Td, 

Phs(qN) d = I dr d Pj(qN,r d ) 

dr d Pd{r d )Phs{qN\r d ). (46) 

By recognizing that diameters other than r m are not ruled out and that a 
more honest representation is an average over all hard-sphere diameters we are 
effectively replacing the hard spheres by a soft-core potential. 

However, we should emphasize that the distribution over the hard-sphere 
diameters Pd{rd) is not being introduced in an ad hoc way in order to account 
for the softness of the LJ potential. The introduction of p(Q) in general (section 
12. 3p . and of Pd(rd) in particular, are mandated by the ME method. The distri- 
bution p{9) would also arise if one were trying to find an ME approximation to 
other problems which do not involve hard spheres at all. 

The distribution of diameters is given by eq. (|lip 

pS[P ha \P] , p -0Fu 

Pd(r d )dr d = 7 1/2 (r d ) dr d = ——J 1/2 (r d ) dr d , (47) 

where S [P) ls \P] = P (F — Fu) > the partition functions ( and C,u are given by 
C = e^Cu with Cu = [ dr d 7 1/2 M e-^, (48) 
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and the natural distance dl 2 = n i(r d )dr 2 d in the space of r d s is given by the 
Fisher-Rao metric, 

/ \ f, D , , n fd\ogP hs {q N \r d )\ 2 
l( r d) = J dq N P hs (q N \r d ) \ — I . (49) 

The remaining problem in the above equations is the calculation of the Fisher- 
Rao measure 7 1 / 2 and this is conveniently done by considering the entropy of 
PhsilN ) relative to Phs(qN \ r d)- A straightforward differentiation shows that 



d 2 S[P hs (-\r' d )\P hs (-\r d )} 



dr'l 



l(r d ). (50) 



Substituting the distributions Phs{q_N \r' d ) and Ph s (qN \r d ) gives 

S[P h Mr' d )\P hs (Au)}=p\F hs a-(U h ^f)J , (51) 

where (• ■ • ) r < is the average over Ph s {qN\r d ). As we argued above eq.j39|) the 
expectation of the potential energy (Uhs ( r ' d ))r' d vanishes because the product 
u ( r \ r 'd)9hs( r \ r 'd) vanishes for both r < r' d and r > r' d . Similarly, (Uhs ( r d))r' d = 
when r' d > r d . However, when r' d < r d the expectation (Uhs (r d )) r ' d diverges, 
S is not defined and eq. ([50"]) is not applicable. We can argue our way out 
of this quandary by pointing out that the divergence is a consequence of the 
unphysical idealization involved in taking a hard-sphere potential seriously. For 
more realistic continuous potentials the distance between r' d = r d + dr d and r d 
is the same as the distance between r' d = r d — dr d and r d . We can then always 
choose r' d > r d and define 7(7^) in eq. ([50]) as the limit r d = r d + + . Then, 
using eq. ([5^)l . we have 



8r' d 



=r d +0+ 



4 + 977 - 4?y 2 

= Nirpr d — -j— . (52) 

(1-7?) 

To summarize, the distribution of diameters P d (r d ) is given by eq. (|47p with 
Fjj given by Eqs. ([3^1 [521 r45|) and 7 given by ([52")) . Our best approximation 
to the "exact" P(^at) is the Phs(qN) given in eq.(0H]). The corresponding best 
approximation to the radial distribution function is 

9hs(r) = J dr d P d {r d )g hs (r \r d ) . (53) 

Since ghs(r) takes into account soft-core effects while <7/ ls (r|r m ) does not, 
we expect that it will lead to improved estimates for all other thermodynamic 
quantities. However, there is a problem. Since the free energy Fjj is an extensive 
quantity, Fjj oc N, for large N the distribution Pd(r d ) ~ exp—f3Fjj is very 
sharply peaked at the optimal diameter r m . This result must be interpreted 
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with care: when choosing a single optimal diameter for a macroscopic fluid 
sample we find that ME confers overwhelming probability to the optimal value. 
This is not surprising. The same thing happens when we calculate the global 
temperature or density of a macroscopic sample: standard textbook results 
predict that fluctuations about the expected value are utterly negligible. And 
yet fluctuations can be important. For example, for small fluid samples, or when 
we consider the local behavior of the fluid, fluctuations are not merely observable 
but can be large. Local fluctuations can be appreciable while global fluctuations 
remain negligible. The question then, is whether these local fluctuations are 
relevant to the particular quantities we want to calculate. We argue that they 
are. 

The radial distribution function g(r) is the crucial quantity from which all 
other thermodynamic variables are computed. But from its very definition as 
the probability that given an atom at a certain place another atom will be found 
at a distance r, it is clear that g(r) refers to purely local behavior and should 
be affected by local fluctuations. To the extent that the optimal r m depends 
on temperature and density we expect that local temperature and/or density 
fluctuations would induce local diameter fluctuations as well. 

The extended analysis in this section does not yet allow us to pursue the 
question of local diameter fluctuations in a satisfactory manner. As we men- 
tioned earlier the ME method is quite rigid in that the only freedom it allows 
is the choice of trial distributions. A proper analysis of diameter fluctuations 
would require enlarging the family of trial distributions to allow for spatial 
inhomogeneities in the diameters of the hard spheres. It may be worthwhile 
spelling out one possible such enlargement. We could imagine a trial model 
where the molecules are hard spheres with a diameter that depends on their 
location rd(r)- As a molecule moves around its diameter shrinks and expands 
according to a prescribed function r^r) which thus acquires the character of an 
external "field". To each possible choice of the diameter field r<j(f) there cor- 
responds one trial distribution. This means that instead of a one-dimensional 
family of trials labelled by the single parameter we would have to deal with 
an infinite-dimensional family labeled by the fields r<j(r). One should remark 
that these trial distributions do not describe any "physical system but this in 
itself is not a problem. The ME method does not attempt to approximate one 
physical system by another physical, albeit idealized system; it just attempts to 
approximate one mathematical distribution by another; there is no requirement 
that the latter be interpretablc in terms of physically realizable Hamiltonians. 
The real problem, of course, is that these inhomogeneous trial models are not 
(at present) sufficiently tractable. However, we could divide the fluid into meso- 
scopic cells and consider trial models where the diameters r^i) are uniform 
within each cell i, which allows a local Percus-Yevick approximation. The ME 
method would then be applied to determine not only the distribution of diame- 
ters within each cell but also the optimal size of the cells. This is a development 
we plan to pursue in future work. 

For the purpose of this paper, however, we can quickly estimate the effects 
of local fluctuations by pointing out that the size of the region (i.e., the size of 
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the cell) that is locally relevant to the calculation of g(r) contains an effective 
number of atoms N c g that can be estimated directly from a feature of the exact 
RDF that turns out to be known. (What we are doing is making the best use 
of information that happens to be available, which is quite in the spirit of our 
information theory-maximum entropy approach.) 

The basic idea is intuitively simple: at very short distances the form of the 
true, exact RDF g (r) is dominated by the repulsive part of the potential. If 
the size of the molecule is given by the Lennard- Jones parameter a, eq. (|4"3"|) , the 
asymptotic form of g(r) is given by 

g (r) -» e- pu{r) for r < a . (54) 

For a sufficiently dilute gas g (r) « e"' 3 "^ holds for all distances r; for dense 
fluids eq. ([54|) is valid only for r — > 0. (This follows from a clever trick due 
to Percus which allows one to write an exact expression for the two-particle 
distribution function for a fluid in terms of the single particle density of the 
same fluid placed in a suitable external potential pUj.) A fluid of hard spheres 
gives g(r\rd) = for r < r<j and cannot reproduce the behavior {51J) . However, 
once we recognize that we can use a statistical mixture, eq. (|46[) . we can tune 
the size N e g of the cell and thereby change the width of Pd (ra) so that the 
radial distribution function g^ (r) of (|53[) reproduces the known short-distance 
behavior. This we proceed to do in the next section. 

6 An example: Lennard-Jones "Argon" 

One of the difficulties in testing theories about fluids against experimental data 
is that it is not easy to see whether discrepancies are to be blamed to a faulty 
approximation or to a wrong intermolecular potential. This is why theories 
are normally tested against molecular dynamics numerical simulations where 
there is control over the intermolecular potential. In this section we compare 
ME results against simulation results [55] for a fluid of monatomic molecules 
interacting through a Lennard-Jones potential, eq. (|4*5]) . The parameters e and 
a (the depth of the well, u\ m - m = and the radius of the repulsive core, 
u(a) = 0, respectively) are chosen to model Argon: e — 1.03 x 10 -2 eV and 
a = 3.405. A 

6.1 The free energy Fu 

Figure [TJ( A) shows the free energy Fjj/NksT as a function of hard-sphere 
diameter for Argon at a fixed density of pa 3 — 0.65 for different temperatures. 
Figure [T](B) shows Fu/NksT as a function of for several densities at fixed 
T = 107.82 K. Since the critical point for Argon is at T c = 150.69 K and p c a 3 = 
0.33 all these curves, except that at 300 K, lie well within the liquid phase. 
The increase of Fjj /NksT for high values of is due to short range repulsion 
between the hard spheres described by Fh s /NkBT. The increase for low is 
due to the Lennard-Jones short-range repulsion as described by (U)hs/NkBT. 
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The best is that which minimizes Fjj and depends both on temperature 
and density. The best diameter decreases as the temperature increases because 
atoms with higher energy can penetrate deeper into the repulsive core. The 
dependence with density is less pronounced. 
(A) ~ (B) 




-8 I 

2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 

r/A) r d (A) 



Figure 1: (A): The free energy Fjj as a function of hard-sphere diameter for 
Argon at a density of pa 3 — 0.65 for different temperatures. The best Td is that 
which minimizes Fjj. (B): Fjj as a function of for Argon at T — 107.82 K for 
different densities. 



6.2 The distribution of diameters Pd{r d ) 

In section [5] we argued that the effective number of molecules that is relevant 
to the local structure of the fluid is not the total number of molecules in the 
system AT, but a smaller number, N c g. In Fig. [2](A) we plot the distribution of 
diameters Pd{rd) for different temperatures, for a fixed fluid density of pa 3 = 
0.65, and for an arbitrarily chosen N c g = 13500. As expected the distribution 
shifts to higher diameters as the temperature decreases. Notice also that the 
distribution becomes narrower at lower temperatures in agreement with the fact 
that a hard-sphere approximation is better at low T [15] . 

Figure [21(B) shows that increasing N c s (with fixed density p) decreases the 
width of Pd(rd) (solid lines) and induces a slight shift of the whole distribution. 
This is due to the dependence ~ (Ncgrd) 1 ^ 2 of the Fisher-Rao measure 7 1 / 2 (r<j) 
in Eq. (f52")) . Figure[3](B) also explores the influence of 7 1 / 2 (jd) by comparing the 
actual distributions Pd{rd) (solid lines) with the distributions e~P Fv ^ rd ) (dotted 
lines) which are obtained by setting 7 1 / 2 = 1 in Eq. (jT7)) . The effect of 7 1 / 2 is 
to shift the distribution slightly to higher r&- 

6.3 The radial distribution function 

We are finally ready to calculate the radial distribution g(r) for Argon. We 
start by estimating the number of molecules N D g that are locally relevant; as 
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Figure 2: (A): The distribution of hard-sphere diameters for Argon for several 
temperatures at density pa 3 = 0.65 for N e g — 13500. (B): Pd (r^) for various 
N cS at T = 107.82 K and pa 3 = 0.65. By setting 7 1 / 2 = 1 (dotted lines) we see 
that the effect of the 7 1 / 2 factor is to cause a slight shift of the distribution. 



explained earlier we choose Neg so that our best approximation §h s (r) , Eq. (|53|) , 
reproduces the known short-distance behavior e~^ utyr \ We have found that the 
estimates for N c g need not be very accurate but that they must be obtained 
for each value of the temperature and density. In Fig. [3] we show an example 
of the short-distance behavior of §h s for three values of N e g at T = 107.82 K 
and pa 3 = 0.65; using a chi-square fit in the range from r = 2.9 to 3.1 A the 
selected best value of N e s is around 38000. 




Figure 3: Estimating N c g by requiring that ghs{f) have the correct short-distace 
behavior e~^ r \ 

In Figs. |U(A)-(D) we compare three different ways to calculate the RDF. 
The solid line is Ver let's molecular dynamics simulation [25]; it plays the role 
of experimental data against which we compare our theory. The dotted line 
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is ghs(r\r m ) for the hard-sphere fluid with optimal diameter r m . This curve, 
calculated from eg. ([27]) , is also the result of the variational method and coin- 
cides with the ME result for a macroscopically large N c g = N. The dashed 
line is the averaged ghs{r) of the extended ME analysis. Figs. H1(A)-(C) 
were plotted at three different temperatures T = 107.82, 124.11 and 189.76 
K at the density pa 3 = 0.65. Fig. |H(D) we changed the density and the tem- 
perature to pa 3 = 0.5 and T = 162.93 K. The agreement between the ME curve 
and Verlet's data is good. The vast improvement over the simpler variational 
method calculation is clear. 

(A) (B) 





Figure 4: The radial distribution function for (a) the hard-sphere fluid with 
optimal diameter r m ; (b) Verlet's molecular dynamics simulation; and (c) the 
improved ME analysis, for Argon at (A): density pa 3 — 0.65, temperature 
T = 107.82 K, and effective particle number N eS = 38000. (B): pa 3 = 0.65, 
T = 124.11 K, and N cS = 40000. (C): pa 3 = 0.65, T = 189.76 K, and iV c ff = 
50000. (D): pa 3 = 0.5, T = 162.93 K, and A^ off = 62000. 

One might be tempted to dismiss this achievement as due to the adjustment 
of the parameter jV e ff but this is not quite correct: N e g has not been adjusted, it 
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has been calculated on the basis of information that is actually available. Indeed, 
despite the fact that the hard-sphere trial solutions that we employ are mere 
approximations, the functional form of the whole curve ghs(f) m eq. (|53[) is 
reproduced quite well. 

6.4 The equation of state 

Finally we use the RDF to calculate the equation of state from the pressure 
equation, Eq. (|2"Tj) . In Fig. O we compare the equation of state derived from 
the g(r) obtained from Verlet 's simulation with calculations using the ME and 
variational methods and the perturbative theories of Barker and Henderson [15j 
and of Weeks, Chandler and Anderson [22], at T = 161.73 K. The ME results 
constitute a clear improvement over the plain variational calculation. For low 
densities all four methods agree with each other but differ from the simulation. 
A better agreement in this region would probably require a better treatment of 
two-particle correlations at long distances. At intermediate densities the best 
agreement is provided by the ME and BH results, while the WCA theory seems 
to be the best at high densities. Also shown in Fig. E](A) are experimental data 
on Argon [33] . The discrepancy between the experimental curve and the Verlet 
simulation is very likely due to the actual potential not being precisely of the 
Lennard- Jones type. 

In Fig. 0(B) we plot the ME equation of state for three different isotherms 
(T = 137.77, 161.73 and 328. 25K). To compare to the simulation of Hansen and 
Verlet [23] we plot f3P (rather than f3P/ p) as a function of density pa 3 because 
this kind of plot exhibits the characteristic van der Waals loop that signals the 
liquid-gas transition as the temperature drops. A more exhaustive exploration 
lies, however, outside the scope of this paper. 

7 Conclusion 

The goal of this paper has been to use the ME method to generate approxima- 
tions and show that this provides a generalization of the Bogoliubov variational 
principle. This addresses a range of applications that lie beyond the scope of the 
traditional MaxEnt. To test the method we considered simple classical fluids. 

When faced with the difficulty of dealing a system described by an intractable 
Hamiltonian, the traditional approach has been to consider a similar albeit ide- 
alized system described by a simpler more tractable Hamiltonian. The approach 
we have followed here departs from this tradition: our goal is not to identify 
an approximately similar Hamiltonian but rather to identify an approximately 
similar probability distribution. The end result of the ME approach is a proba- 
bility distribution which is a sum or an integral over distributions corresponding 
to different hard-sphere diameters. While each term in the sum is of a form that 
can be associated to a real hard-sphere gas, the sum itself is not of the form 
exp —(3H, and cannot be interpreted as describing any physical system. 
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Figure 5: (A): The Argon equation of state calculated using the ME method, the 
variational method and the perturbative theories of BH and WCA are compared 
to the Verlet simulation at T = 161.73 K. Also shown are Levelt's experimental 
results. (B): (3P versus the reduced density per 3 calculated using the ME method 
(solid line) and compared to the Hansen- Verlet simulation for three different 
isotherms. The graph shows the appearance of the liquid-gas van der Waals 
loop as the temperature drops. 

As far as the application to simple fluids is concerned the results achieved 
in this paper represent progress but further improvements are possible by us- 
ing better approximations to the hard-sphere fluid and by choosing a broader 
family of trial distributions. We argued that an important improvement would 
be achieved if one could extend the trial probability distributions to include 
inhomogencitics in the hard sphere diameters. This would lead to a systematic, 
fully ME method for the determination of the effective number of particles N e g 
that are locally relevant. 

Many perturbative approaches to fluids had been proposed, and a gradual 
process of selection over many years of research led to the optimized theories 
of BH and WCA. The variational approach was definitely less satisfactory than 
these "best" perturbation theories. With our work, however, the situation has 
changed: the ME-improved variational approach offers predictions that already 
are competitive with the best perturbative theories. And, of course, the poten- 
tial for further improvements of the ME approach remains, at this early date, 
far from being exhausted. 
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